
Collaborating Authors

Monroe County


February stargazing: A planet parade comes to town

Popular Science

And why 2026 could be a big year for spotting auroras. Northern lights shine in the night sky over the landscape in northeastern Germany on January 19, 2026. Still, patient stargazers will be rewarded with a memorable planetary alignment. And for those readers joining us from the Southern Hemisphere, there's also the Alpha Centaurids meteor shower to look forward to.


12 ethereal images from the 2025 Northern Lights Photographer of the Year awards

Popular Science

Photographer Victor Lima used a 12-millimeter fisheye lens to take this panoramic image of the aurora borealis over the Skógafoss waterfall in Iceland. Watching the aurora borealis (in the Northern Hemisphere) or aurora australis (in the Southern Hemisphere) is unforgettable. Photographing them is on a whole other level. Capturing these ribbons of light as they move and twist across the night sky transforms even the darkest winter night into a surreal wonderland.


Text-VQA Aug: Pipelined Harnessing of Large Multimodal Models for Automated Synthesis

Joshi, Soham, Mishra, Shwet Kamal, Gopalakrishnan, Viswanath

arXiv.org Artificial Intelligence

Creating large-scale databases for Visual Question Answering tasks that concern the text in a scene (text-VQA) requires skilful human annotation, which is tedious and challenging. With the advent of foundation models that handle both vision and language, and with the maturity of OCR systems, there is a pressing need for an end-to-end pipeline that can synthesize Question-Answer (QA) pairs from the scene text in a given image. We propose a pipeline for the automated synthesis of text-VQA datasets that produces faithful QA pairs and scales with the availability of scene-text data. Our method combines multiple models and algorithms: OCR detection and recognition (text spotting), region of interest (ROI) detection, caption generation, and question generation. These components are streamlined into a cohesive pipeline that automates both the synthesis and the validation of QA pairs. To the best of our knowledge, this is the first pipeline proposed to automatically synthesize and validate a large-scale text-VQA dataset, comprising about 72K QA pairs drawn from about 44K images.
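The stage ordering described above can be sketched as a chain of pluggable callables. This is a minimal sketch under stated assumptions: the type aliases (`TextSpotter`, `RoiDetector`, `Captioner`, `QuestionGen`), the function `synthesize`, and the validation rule (keep a pair only if every word of its answer appears in the recognized scene text) are all hypothetical illustrations, not the authors' components or their actual validation criteria.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stage signatures; a real system would wrap an OCR engine,
# an ROI detector, an image captioner and an LLM question generator.
TextSpotter = Callable[[str], List[str]]                         # image -> OCR tokens
RoiDetector = Callable[[str], List[str]]                         # image -> region descriptions
Captioner = Callable[[str, List[str]], str]                      # image, regions -> caption
QuestionGen = Callable[[str, List[str]], List[Tuple[str, str]]]  # caption, tokens -> (q, a)


@dataclass
class QAPair:
    image: str
    question: str
    answer: str


def synthesize(image: str, spot: TextSpotter, roi: RoiDetector,
               caption: Captioner, generate: QuestionGen) -> List[QAPair]:
    tokens = spot(image)
    if not tokens:
        return []  # no scene text, so nothing to ask about
    regions = roi(image)
    cap = caption(image, regions)
    candidates = generate(cap, tokens)
    # Hypothetical validation: the answer must be grounded in the OCR output.
    vocab = {t.lower() for t in tokens}
    return [QAPair(image, q, a) for q, a in candidates
            if all(w.lower() in vocab for w in a.split())]


# Usage with toy stand-ins for every stage.
pairs = synthesize(
    "shop.jpg",
    spot=lambda img: ["OPEN", "24", "HOURS"],
    roi=lambda img: ["storefront sign"],
    caption=lambda img, regions: "A storefront sign reading OPEN 24 HOURS.",
    generate=lambda cap, toks: [("How many hours is the shop open?", "24"),
                                ("What color is the sign?", "red")],
)
print(pairs)  # only the "24" pair survives validation
```

Passing each stage as a callable keeps the pipeline agnostic to which OCR or generation model is plugged in, which matches the abstract's framing of a pipeline that "harnesses" several interchangeable models.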


Learning Large-Scale Poisson DAG Models based on OverDispersion Scoring

Gunwoong Park, Garvesh Raskutti

Neural Information Processing Systems

In this paper, we address the question of identifiability and learning algorithms for large-scale Poisson Directed Acyclic Graphical (DAG) models. We define general Poisson DAG models as models where each node is a Poisson random variable with rate parameter depending on the values of the parents in the underlying DAG. First, we prove that Poisson DAG models are identifiable from observational data, and present a polynomial-time algorithm that learns the Poisson DAG model under suitable regularity conditions. The main idea behind our algorithm is based on overdispersion, in that variables that are conditionally Poisson are overdispersed relative to variables that are marginally Poisson.
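The overdispersion idea follows from the law of total variance: if X2 given its parents is Poisson with rate λ(parents), then Var(X2) = E[Var(X2 | parents)] + Var(E[X2 | parents]) = E[X2] + Var(λ(parents)), which exceeds E[X2] whenever the rate actually varies, while a root node that is marginally Poisson has variance equal to its mean. The simulation below is a minimal illustration on an assumed two-node graph X1 → X2 with rate 1 + X1 (chosen purely for illustration); the helper names are hypothetical, and the paper's learning algorithm, which builds on this idea, is not reproduced here.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def overdispersion(xs):
    """Sample variance minus sample mean; close to 0 for a Poisson variable."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return var - mean


rng = random.Random(0)
n = 20_000
x1 = [poisson(2.0, rng) for _ in range(n)]   # root: marginally Poisson(2)
x2 = [poisson(1.0 + a, rng) for a in x1]     # child: Poisson(1 + X1) given its parent

root_score = overdispersion(x1)    # expected near 0
child_score = overdispersion(x2)   # expected near Var(1 + X1) = 2
print(f"root: {root_score:.3f}, child: {child_score:.3f}")
```

With E[X2] = 3 and Var(X2) = 3 + 2 = 5 in this toy model, the child's score should land near 2 while the root's stays near 0, which is the asymmetry an ordering procedure can exploit.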


General Pruning Criteria for Fast SBL

Möderl, Jakob, Leitinger, Erik, Fleury, Bernard Henri

arXiv.org Machine Learning

Sparse Bayesian learning (SBL) associates a hyperparameter with each weight in the underlying linear model by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated hyperparameter. The method estimates the hyperparameters by marginalizing out the weights and performing (marginalized) maximum likelihood (ML) estimation. In SBL, many hyperparameter estimates diverge to infinity, which effectively sets the estimates of the corresponding weights to zero (i.e., prunes those weights from the model) and thereby yields a sparse estimate of the weight vector. In this letter, we analyze the marginal likelihood as a function of a single hyperparameter while keeping the others fixed, when the Gaussian assumptions on the noise samples and on the weight distribution that underlie the derivation of SBL are weakened. We derive sufficient conditions that lead, on the one hand, to finite hyperparameter estimates and, on the other, to infinite ones. Finally, we show that in the Gaussian case the two conditions are complementary and coincide with the pruning condition of fast SBL (F-SBL), thereby providing additional insight into this algorithm.
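For reference, the Gaussian-case pruning condition that the letter generalizes can be written out explicitly. The notation below (dictionary column φ_i, observation vector t, noise variance σ², and the "sparsity" and "quality" factors s_i and q_i) is assumed from the standard fast marginal likelihood formulation of F-SBL, not taken from the letter itself.

```latex
\mathbf{C}_{-i} = \sigma^{2}\mathbf{I} + \sum_{j \neq i} \alpha_j^{-1}\boldsymbol{\phi}_j\boldsymbol{\phi}_j^{\top},
\qquad
s_i = \boldsymbol{\phi}_i^{\top}\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i,
\qquad
q_i = \boldsymbol{\phi}_i^{\top}\mathbf{C}_{-i}^{-1}\mathbf{t}

\ell(\alpha_i) = \tfrac{1}{2}\left[\log\alpha_i - \log(\alpha_i + s_i) + \frac{q_i^{2}}{\alpha_i + s_i}\right] + \text{const}

\hat{\alpha}_i =
\begin{cases}
  \dfrac{s_i^{2}}{q_i^{2} - s_i}, & q_i^{2} > s_i,\\[4pt]
  \infty \;(\text{prune } w_i), & q_i^{2} \le s_i.
\end{cases}
```

Setting the derivative of ℓ(α_i) to zero gives α_i(q_i² − s_i) = s_i², hence the finite maximizer when q_i² > s_i. When q_i² ≤ s_i, the derivative is proportional to α_i(s_i − q_i²) + s_i², which is positive for every α_i > 0, so the marginal likelihood keeps increasing and the estimate diverges, pruning the weight.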


Aurora over Mars gives Red Planet a green glow

Popular Science

Planetary scientists can now predict when the aurora will shine over our cosmic neighbor. An artist's impression of how the aurora might appear in the sky above the Perseverance rover. Just like Earth, our cosmic neighbor Mars sometimes has auroras dance across its night sky. In March 2024, NASA's Perseverance Mars rover imaged visible-light auroras for the first time during a major solar flare and coronal mass ejection.


Engineering Serendipity through Recommendations of Items with Atypical Aspects

Aditya, Ramit, Bunescu, Razvan, Nannaware, Smita, Al-Hossami, Erfan

arXiv.org Artificial Intelligence

A restaurant dinner or a hotel stay may lead to memorable experiences when guests encounter unexpected aspects that also match their interests. For example, an origami-making station in the waiting area of a restaurant may be both surprising and enjoyable for a customer who is passionate about paper crafts. Similarly, an exhibit of 18th-century harpsichords would be atypical for a hotel lobby and would likely pique the interest of a guest with a passion for Baroque music. Motivated by this insight, in this paper we introduce the new task of engineering serendipity through recommendations of items with atypical aspects. We describe an LLM-based system pipeline that extracts atypical aspects from item reviews, then estimates and aggregates their user-specific utility into a measure of serendipity potential that is used to rerank a list of items recommended to the user. To facilitate system development and evaluation, we introduce a dataset of Yelp reviews manually annotated with atypical aspects and a dataset of artificially generated user profiles, together with crowdsourced annotations of user-aspect utility values. Furthermore, we introduce a custom procedure for dynamic selection of in-context learning examples, which is shown to improve LLM-based judgments of atypicality and utility. Experimental evaluations show that the serendipity-based rankings generated by the system are highly correlated with ground-truth rankings whose serendipity scores are computed from manual annotations of atypical aspects and their user-dependent utility. Overall, we hope that the new recommendation task and the associated system presented in this paper catalyze further research into recommendation approaches that go beyond accuracy in their pursuit of enhanced user satisfaction. The datasets and the code are publicly available at https://github.com/ramituncc49er/ATARS.
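The reranking step at the end of the pipeline can be sketched once per-item, per-user aspect utilities are available. This is a minimal sketch assuming those utilities were already estimated upstream (by the LLM stage in the paper) and assuming a plain sum as the aggregation; `serendipity_potential` and `rerank` are hypothetical names, and the paper's actual aggregation may differ.

```python
from typing import Dict, List


def serendipity_potential(aspect_utils: List[float]) -> float:
    # Hypothetical aggregation: sum of the user-specific utilities of an
    # item's atypical aspects.
    return sum(aspect_utils)


def rerank(items: List[str], atypical_utils: Dict[str, List[float]]) -> List[str]:
    """Reorder a recommended list by serendipity potential.

    Python's sort is stable (also with reverse=True), so items with equal
    scores keep the order produced by the original recommender.
    """
    return sorted(items,
                  key=lambda item: serendipity_potential(atypical_utils.get(item, [])),
                  reverse=True)


recommended = ["bistro", "diner", "cafe"]       # output of a base recommender
utils = {
    "cafe": [0.9],          # e.g. an origami station, for a paper-craft fan
    "diner": [0.2, 0.1],
}
reranked = rerank(recommended, utils)
print(reranked)  # ['cafe', 'diner', 'bistro']
```

Relying on sort stability means the base recommender's relevance order acts as the tie-breaker, so items without atypical aspects are not shuffled arbitrarily.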


Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models

Bi, Baolong, Liu, Shenghua, Wang, Yiwei, Xu, Yilong, Fang, Junfeng, Mei, Lingrui, Cheng, Xueqi

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) mitigates hallucinations in Large Language Models (LLMs) by integrating external knowledge. However, conflicts between parametric knowledge and retrieved context pose challenges, particularly when the retrieved information is unreliable or the model's internal knowledge is outdated. In such cases, LLMs struggle to decide whether to rely more on their own parameters or on the conflicting context. To address this, we propose CK-PLUG, a plug-and-play method for controlling LLMs' reliance on parametric and contextual knowledge. We introduce a novel knowledge consistency metric, Confidence Gain, which detects knowledge conflicts by measuring entropy shifts in token probability distributions after context insertion. CK-PLUG then enables fine-grained control over knowledge preference by adjusting the probability distribution of tokens with negative confidence gain through a single tuning parameter. Experiments demonstrate CK-PLUG's ability to significantly regulate knowledge reliance in counterfactual RAG scenarios while maintaining generation fluency and knowledge accuracy. For instance, on Llama3-8B, the memory recall (MR) of RAG responses can be adjusted within a broad range (9.9%-71.9%), compared to the baseline of 42.1%. Moreover, CK-PLUG supports adaptive control based on the model's confidence in both internal and external knowledge, achieving consistent performance improvements across various general RAG tasks. Our code is available at https://github.com/byronBBL/CK-PLUG.
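The conflict check and the single-knob control can be illustrated on toy next-token distributions. This sketch reads "confidence gain" as the entropy before context insertion minus the entropy after it, so a negative value means the context made the model less certain; the interpolation rule in `adjust` and its parameter `alpha` are hypothetical stand-ins, not CK-PLUG's actual adjustment.

```python
import math
from typing import Dict

Dist = Dict[str, float]  # token -> probability


def entropy(p: Dist) -> float:
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def confidence_gain(p_param: Dist, p_ctx: Dist) -> float:
    """Entropy drop after inserting context (assumed definition).

    Negative means the context raised uncertainty, i.e. a likely conflict.
    """
    return entropy(p_param) - entropy(p_ctx)


def adjust(p_param: Dist, p_ctx: Dist, alpha: float) -> Dist:
    """Hypothetical control rule: on negative confidence gain, mix the
    parametric and contextual distributions with one knob alpha in [0, 1]
    (1 = trust parameters, 0 = trust context); otherwise keep the context."""
    if confidence_gain(p_param, p_ctx) >= 0:
        return dict(p_ctx)
    vocab = set(p_param) | set(p_ctx)
    return {t: alpha * p_param.get(t, 0.0) + (1 - alpha) * p_ctx.get(t, 0.0)
            for t in vocab}


p_param = {"Paris": 0.9, "Lyon": 0.1}   # model alone is confident
p_ctx = {"Paris": 0.5, "Lyon": 0.5}     # conflicting context adds doubt
mixed = adjust(p_param, p_ctx, alpha=0.8)
print(round(mixed["Paris"], 2))  # 0.82
```

A convex combination of two distributions is itself a distribution, so the adjusted probabilities still sum to one without renormalization.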


Satyrn: A Platform for Analytics Augmented Generation

Sterbentz, Marko, Barrie, Cameron, Shahi, Shubham, Dutta, Abhratanu, Hooshmand, Donna, Pack, Harper, Hammond, Kristian J.

arXiv.org Artificial Intelligence

Large language models (LLMs) are capable of producing documents, and retrieval-augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach makes it possible to apply standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present Satyrn, a neurosymbolic platform that leverages AAG to produce accurate, fluent, and coherent reports grounded in large-scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, compared to GPT-4 Code Interpreter, for which just 57% of claims are accurate.
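The AAG flow (run analyses over a table, render each result as a sentence, hand the sentences to an LLM as grounding) can be sketched with the standard library. The analyses, the fact templates, and the names `analytics_facts` and `build_prompt` are hypothetical illustrations; the abstract does not describe Satyrn's actual analytics or templates, and the LLM call itself is omitted.

```python
from statistics import mean
from typing import Dict, List

Row = Dict[str, object]


def analytics_facts(rows: List[Row], group: str, value: str) -> List[str]:
    """Group rows, compute counts and means, and render each result as a fact."""
    groups: Dict[str, List[float]] = {}
    for r in rows:
        groups.setdefault(str(r[group]), []).append(float(r[value]))
    facts = []
    for g, vals in sorted(groups.items()):
        noun = "record" if len(vals) == 1 else "records"
        facts.append(f"{g}: {len(vals)} {noun}, mean {value} {mean(vals):.1f}.")
    top = max(groups, key=lambda g: mean(groups[g]))
    facts.append(f"{top} has the highest mean {value}.")
    return facts


def build_prompt(facts: List[str], task: str) -> str:
    # The fact list plays the role that retrieved passages play in RAG.
    bullets = "\n".join(f"- {f}" for f in facts)
    return f"Use only the facts below.\nFacts:\n{bullets}\n\nTask: {task}"


rows = [
    {"region": "North", "sales": 30},
    {"region": "North", "sales": 50},
    {"region": "South", "sales": 20},
]
facts = analytics_facts(rows, "region", "sales")
prompt = build_prompt(facts, "Write a one-paragraph sales summary.")
print(prompt)
```

Because every number in the prompt is computed symbolically before generation, the LLM only has to verbalize facts rather than perform the arithmetic itself.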


Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution

Xu, Shicheng, Pang, Liang, Shen, Huawei, Cheng, Xueqi

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) uses retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs because of noisy or incorrect retrieved texts. This suggests that RAG has a duality, bringing both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical explanation for the duality in RAG. The benefit and detriment within this duality remain a black box that cannot be quantified or compared in an explainable manner. This paper takes the first step toward a theoretical explanation of benefit and detriment in RAG by (1) decoupling and formalizing them from the RAG prediction, (2) approximating the gap between their values by representation similarity, and (3) establishing the trade-off mechanism between them, making them explainable, quantifiable, and comparable. We demonstrate that the distribution difference between retrieved texts and LLMs' knowledge acts as a double-edged sword, bringing both benefit and detriment. We also prove that the actual effect of RAG can be predicted at the token level. Based on our theory, we propose a novel, practical method, X-RAG, which achieves collaborative generation between the pure LLM and RAG at the token level to preserve the benefit and avoid the detriment. Experiments on real-world tasks with LLMs including OPT, LLaMA-2, and Mistral show the effectiveness of our method and support our theoretical results.
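Token-level collaboration between a pure LLM and a RAG pipeline can be pictured as a decoding loop that consults both at every step. This is a minimal sketch of the control flow only: the `judge` callable stands in for X-RAG's theory-derived decision about whether retrieval helps or hurts at a given token, which is not reproduced here, and the names `collaborative_decode`, `NextToken`, and `Judge` are hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[str]], str]          # prefix -> greedy next token
Judge = Callable[[List[str], str, str], str]    # prefix, llm_tok, rag_tok -> chosen


def collaborative_decode(pure_llm: NextToken, rag: NextToken, judge: Judge,
                         max_len: int = 50, eos: str = "<eos>") -> List[str]:
    """Decode token by token, consulting both generators at every step.

    When both propose the same token it is emitted directly; on
    disagreement a (hypothetical) judge decides which token to keep.
    """
    out: List[str] = []
    for _ in range(max_len):
        a, b = pure_llm(out), rag(out)
        tok = a if a == b else judge(out, a, b)
        if tok == eos:
            break
        out.append(tok)
    return out


# Toy stand-ins: each "model" simply replays a fixed sequence.
llm_seq = ["The", "capital", "is", "Lyon", "<eos>"]
rag_seq = ["The", "capital", "is", "Paris", "<eos>"]
text = collaborative_decode(
    pure_llm=lambda prefix: llm_seq[len(prefix)],
    rag=lambda prefix: rag_seq[len(prefix)],
    judge=lambda prefix, llm_tok, rag_tok: rag_tok,  # placeholder: always trust retrieval
)
print(" ".join(text))  # The capital is Paris
```

Only disagreement points reach the judge, which mirrors the abstract's claim that the effect of RAG can be assessed per token rather than per response.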